A curated reading list for those starting to learn about Large Language Models (LLMs), covering foundational concepts, practical applications, and future trends, updated for 2026.
This article explores the field of mechanistic interpretability, aiming to understand how large language models (LLMs) work internally by reverse-engineering their computations. It discusses techniques for identifying and analyzing the functions of individual neurons and circuits within these models, offering insights into their decision-making processes.
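As a loose sketch of the kind of neuron-level probing the article describes, the example below records one hidden unit's activations with a PyTorch forward hook; the toy model, the hooked layer, and the unit index are placeholder assumptions, not details from the article.

```python
# Minimal sketch: recording one hidden unit's activations with a forward hook.
# The model, layer, and unit index are arbitrary placeholders for illustration.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
captured = []

def hook(module, inputs, output):
    # Keep the activation of hidden unit 7 for every example in the batch.
    captured.append(output[:, 7].detach().clone())

handle = model[1].register_forward_hook(hook)  # hook the ReLU layer

with torch.no_grad():
    model(torch.randn(8, 16))  # one forward pass on random inputs

handle.remove()
print(torch.cat(captured))  # activations of the probed unit
```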
Zhipu AI has released GLM-4.7-Flash, a 30B-A3B MoE model designed for efficient local coding and agent applications. It offers strong coding and reasoning performance with a 128k token context length and supports English and Chinese.
We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute- and memory-constrained applications, available in three model sizes: 3B, 8B, and 14B parameters. For each model size, we release three variants: a pretrained base model for general-purpose use, an instruction-finetuned model, and a reasoning model for complex problem-solving.
This blog post details how to implement high-performance matrix multiplication using NVIDIA cuTile, focusing on Tile loading, computation, storage, and block-level parallel programming. It also covers best practices for Tile programming and performance optimization strategies.
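Since the cuTile API itself isn't reproduced here, the following NumPy sketch only mirrors the load-tile / compute / store-tile structure the post describes; the tile size, shapes, and pure-Python loops are illustrative assumptions rather than anything cuTile-specific.

```python
# Illustration of the load/compute/store tiling pattern in plain NumPy.
# This is NOT the cuTile API; it only mirrors the block structure described in the post.
import numpy as np

def tiled_matmul(A, B, tile=32):
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):          # each (i, j) block is what one thread block would own
        for j in range(0, N, tile):
            acc = np.zeros((min(tile, M - i), min(tile, N - j)), dtype=A.dtype)
            for k in range(0, K, tile):  # march along K, one tile of A and B at a time
                a_tile = A[i:i + tile, k:k + tile]   # "load" tiles
                b_tile = B[k:k + tile, j:j + tile]
                acc += a_tile @ b_tile               # "compute" on the tiles
            C[i:i + tile, j:j + tile] = acc          # "store" the finished output tile
    return C

A = np.random.rand(96, 64).astype(np.float32)
B = np.random.rand(64, 80).astype(np.float32)
assert np.allclose(tiled_matmul(A, B), A @ B, atol=1e-3)
```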
This article presents a compelling argument that the Manifold-Constrained Hyper-Connections (mHC) method in deep learning isn't just a mathematical trick, but a fundamentally physics-inspired approach rooted in the principle of energy conservation.
The author argues that standard neural networks act as "active amplifiers," injecting energy and potentially leading to instability. mHC, conversely, aims to create "passive systems" that route information without creating or destroying it. This is achieved by enforcing constraints on the weight matrices, specifically requiring them to be doubly stochastic.
The derivation of these constraints is presented from a "first principles" physics perspective:
* **Conservation of Signal Mass:** Ensures the total input signal equals the total output signal (Column Sums = 1).
* **Bounding Signal Energy:** Prevents energy from exploding by ensuring the output is a convex combination of inputs (non-negative weights).
* **Time Symmetry:** Guarantees energy conservation during backpropagation (Row Sums = 1).
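A quick numerical check of these three properties is sketched below; the doubly stochastic matrix and the test signals are made up for illustration and are not taken from the article.

```python
# Numerical check of the three constraints on a small doubly stochastic matrix.
# The matrix W and the signals are made up for illustration, not from the article.
import numpy as np

W = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.5, 0.2],
              [0.1, 0.2, 0.7]])          # non-negative, rows and columns each sum to 1

x = np.random.randn(3)                   # forward signal
g = np.random.randn(3)                   # gradient arriving from the next layer

y = W @ x                                # forward pass
grad_x = W.T @ g                         # backward pass

print(np.isclose(y.sum(), x.sum()))          # signal "mass" conserved   (column sums = 1)
print(np.abs(y).max() <= np.abs(x).max())    # outputs are convex mixes  (rows: non-neg, sum 1)
print(np.isclose(grad_x.sum(), g.sum()))     # gradient mass conserved   (row sums = 1)
```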
The article also draws a parallel to Information Theory, framing mHC as a way to mitigate the information loss implied by the Data Processing Inequality: information is preserved through "soft routing", akin to a permutation, rather than lossy compression.
Finally, it explains how the Sinkhorn-Knopp algorithm is used to enforce these constraints, effectively projecting the network's weights onto the Birkhoff Polytope, ensuring stability and adherence to the laws of thermodynamics. The core idea is that a stable deep network should behave like a system of pipes and valves, routing information without amplifying it.
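As a rough sketch of that projection step (the positive-matrix construction, iteration count, and tolerance are assumptions, not the article's implementation), Sinkhorn-Knopp alternately normalizes rows and columns until the matrix is approximately doubly stochastic:

```python
# Minimal Sinkhorn-Knopp sketch: alternately normalize rows and columns of a
# positive matrix until it is (approximately) doubly stochastic, i.e. a point
# in the Birkhoff polytope. Iteration count and epsilon are arbitrary choices.
import numpy as np

def sinkhorn_knopp(M, n_iters=100, eps=1e-9):
    P = np.abs(M) + eps                    # Sinkhorn-Knopp needs strictly positive entries
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)  # make every row sum to 1
        P /= P.sum(axis=0, keepdims=True)  # make every column sum to 1
    return P

raw = np.random.randn(4, 4)                # unconstrained "weights"
W = sinkhorn_knopp(raw)

print(W.sum(axis=0))   # columns ~ 1: signal mass conserved
print(W.sum(axis=1))   # rows    ~ 1: gradient mass conserved
```

In practice the positive matrix is often obtained by exponentiating logits rather than taking an absolute value, but the alternating row/column normalization is the same idea.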
This Python code demonstrates a neural network application on a CircuitPython board, utilizing a camera (OV7670) for image capture, preprocessing, and inference using a digit classifier. It includes image conversion, auto-cropping, and normalization steps.
Train your neural network in TensorFlow or PyTorch, and run it inside CircuitPython using a single line of Python code.
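The preprocessing steps mentioned above correspond roughly to the generic pipeline sketched below; this is a NumPy illustration of the convert / auto-crop / normalize stages, not the project's actual CircuitPython code (which works on the camera buffer, typically via ulab), and all sizes and thresholds are assumptions.

```python
# Generic illustration of the preprocessing steps: grayscale conversion,
# auto-cropping to the region containing the digit, and normalization.
# This is NOT the project's CircuitPython code; it uses NumPy for clarity.
import numpy as np

def preprocess(rgb_frame, out_size=28, threshold=0.2):
    # Convert RGB to grayscale in [0, 1].
    gray = rgb_frame.astype(np.float32).mean(axis=-1) / 255.0
    # Auto-crop: keep the bounding box of "bright enough" pixels.
    mask = gray > threshold
    if mask.any():
        rows = np.where(mask.any(axis=1))[0]
        cols = np.where(mask.any(axis=0))[0]
        gray = gray[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    # Nearest-neighbour resize to the classifier's input size.
    r_idx = (np.arange(out_size) * gray.shape[0] / out_size).astype(int)
    c_idx = (np.arange(out_size) * gray.shape[1] / out_size).astype(int)
    resized = gray[np.ix_(r_idx, c_idx)]
    # Normalize to zero mean / unit-ish scale before inference.
    return (resized - resized.mean()) / (resized.std() + 1e-6)

frame = np.random.randint(0, 256, size=(120, 160, 3), dtype=np.uint8)  # stand-in camera frame
print(preprocess(frame).shape)  # (28, 28)
```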
This article details research into finding the optimal architecture for small language models (70M parameters), exploring depth-width tradeoffs, comparing different architectures, and introducing Dhara-70M, a diffusion model offering 3.8x faster throughput with improved factuality.
A deep dive into the process of LLM inference, covering tokenization, transformer architecture, KV caching, and optimization techniques for efficient text generation.
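To make the KV-caching step concrete, here is a toy single-head attention decoder loop (dimensions and random projection weights are illustrative assumptions): at each step only the new token's key and value are computed and appended to the cache, and attention runs over everything cached so far.

```python
# Toy single-head attention with a KV cache: each decoding step projects only the
# newest token, appends its key/value to the cache, and attends over the whole cache.
# Dimensions and random weights are illustrative, not from the article.
import numpy as np

d = 16
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

k_cache, v_cache = [], []

def decode_step(x_new):
    """x_new: (d,) hidden state of the newest token."""
    q = x_new @ Wq
    k_cache.append(x_new @ Wk)        # cache grows by one key ...
    v_cache.append(x_new @ Wv)        # ... and one value per step
    K = np.stack(k_cache)             # (t, d) keys for all tokens so far
    V = np.stack(v_cache)             # (t, d) values for all tokens so far
    scores = K @ q / np.sqrt(d)       # (t,) attention logits
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()          # softmax over cached positions
    return weights @ V                # attention output for the new token

for _ in range(5):                    # generate 5 steps; past K/V are never recomputed
    out = decode_step(rng.standard_normal(d))
print(out.shape)  # (16,)
```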